# Load MASS before the tidyverse so that MASS::select() does not mask dplyr::select()
library(MASS)
library(tidyverse)
library(RSQLite)
library(dbplyr)
library(janitor)
library(lubridate)
library(datasets)
library(ggthemes)
library(gganimate)
library(modelr)
library(broom)
library(ggfortify)
library(infer)
library(tseries)
library(forecast)
library(fable)
library(fabletools)
library(tsibble)
library(tsibbledata)
library(feasts)
# Connecting
conn <- dbConnect(SQLite(), "raw_data/FPA_FOD_20170508.sqlite")
# Pulling all the names of the tables in the database file
as.data.frame(dbListTables(conn))
# Making fires dataframe
fires <- tbl(conn, "Fires") %>% collect()
# EPSG worldwide geodetic parameter dataset system
spatial_ref <- tbl(conn, "spatial_ref_sys_all") %>% collect()
# National Wildfire Coordinating Group unit abbreviations
NWGG <- tbl(conn, "NWCG_UnitIDActive_20170109") %>% collect()
# Disconnect
dbDisconnect(conn)
fires_small <- fires %>%
select(NWCG_REPORTING_AGENCY, SOURCE_REPORTING_UNIT_NAME, FIRE_NAME,
FIRE_YEAR, DISCOVERY_DATE, DISCOVERY_DOY, DISCOVERY_TIME, CONT_DATE,
CONT_DOY, CONT_TIME, STAT_CAUSE_CODE, STAT_CAUSE_DESCR, FIRE_SIZE,
FIRE_SIZE_CLASS, LATITUDE, LONGITUDE, OWNER_CODE, OWNER_DESCR, STATE,
COUNTY, FIPS_CODE, FIPS_NAME, Shape)
fires_small <- clean_names(fires_small)
# Convert the categorical columns to factors in one step
fires_small <- fires_small %>%
  mutate(across(
    c(nwcg_reporting_agency, stat_cause_code, fire_size_class, owner_descr, state),
    as.factor
  ))
fires_small <- fires_small %>%
  mutate(date_origin = as.Date(paste0(fire_year, "-01-01"))) %>%
  # discovery_doy counts from 1 (1 = 1 January), so subtract 1 before adding it to the origin
  mutate(discovery_date = date_origin + (discovery_doy - 1)) %>%
  mutate(discovery_moy = month(discovery_date, label = TRUE)) %>%
  select(-date_origin)
year_plot <- fires_small %>%
group_by(fire_year) %>%
summarise(num_fires =n())
year_plot %>%
ggplot +
aes(x = fire_year, y = num_fires) +
geom_point() +
ylim(0, 120000)
# geom_col(fill = "dark blue", col ="white") +
# geom_smooth(method = "lm", se = FALSE, colour = "red")
There is a lot of variation in the data between years. Visually, a pattern appears to repeat roughly every 5 years, with 4 peaks visible within this reporting period. Going by the historic weather records for that period, these peaks seem to coincide with recorded heatwaves in 2000, 2006 and 2011 (1).
(1) https://en.wikipedia.org/wiki/List_of_heat_waves
model <- lm(formula = num_fires ~ fire_year, data = year_plot)
summary(model)
Call:
lm(formula = num_fires ~ fire_year, data = year_plot)
Residuals:
Min 1Q Median 3Q Max
-16835 -8688 -2049 9226 34793
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -609543.8 756667.9 -0.806 0.429
fire_year 343.3 377.7 0.909 0.373
Residual standard error: 12810 on 22 degrees of freedom
Multiple R-squared: 0.03621, Adjusted R-squared: -0.007601
F-statistic: 0.8265 on 1 and 22 DF, p-value: 0.3731
tidy(model)
clean_names(glance(model))
The R-squared of 0.036 is very low, as expected from the widely spread plot, and the p-value of 0.37 for fire_year shows that the year on its own does not explain the variation in the number of fires. The model is not a good fit.
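The summary statistics are linked, and checking them by hand makes the "poor fit" reading concrete. This sketch plugs the printed R-squared back into the standard formulas; n = 24 comes from the 22 residual degrees of freedom plus the two estimated coefficients. With one predictor, the F-statistic is also the square of the slope's t value (0.909² ≈ 0.83).

```r
# Values copied from summary(model) above
r_squared <- 0.03621
n <- 24   # yearly observations, 1992 to 2015 (22 residual df + 2 coefficients)
p <- 1    # one predictor: fire_year

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# With a single predictor, F = R^2 / (1 - R^2) * (n - 2), which equals t^2 for the slope
f_stat <- r_squared / (1 - r_squared) * (n - p - 1)

round(adj_r_squared, 4)
round(f_stat, 3)
```

A negative adjusted R-squared means the year explains less variation than the penalty for adding it, which is another way of saying the straight-line model adds nothing here.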
autoplot(model)
The diagnostic plots agree that the model isn't a good fit, and suggest the relationship is curved rather than a straight line. This is unsurprising for time series data, where successive observations are not independent as a linear model assumes.
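One way to put a number on the lack of independence is a Ljung-Box test on the residuals of a straight-line fit. Because year_plot is not reproduced here, this minimal sketch uses the built-in annual Nile flow series as a stand-in; the choice of lag = 10 is also an assumption. For the fire data the equivalent call would be Box.test(residuals(model), ...), though with only 24 years a smaller lag would be more sensible.

```r
# Stand-in data: the built-in annual Nile flow series (1871 to 1970),
# used only because year_plot is not reproduced in this sketch
nile <- data.frame(year = as.numeric(time(Nile)), flow = as.numeric(Nile))

# Fit a straight-line trend, as model does for the yearly fire counts
trend_fit <- lm(flow ~ year, data = nile)

# Ljung-Box test. H0: residuals are independent up to lag 10.
# A small p-value means the independence assumption behind lm() is violated.
lb <- Box.test(residuals(trend_fit), lag = 10, type = "Ljung-Box")
lb$p.value
```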
# Store the fitted values and residuals alongside the yearly counts
year_plot <- year_plot %>%
  add_predictions(model) %>%
  add_residuals(model)
year_plot
year_plot %>%
ggplot(aes(x = fire_year)) +
geom_point(aes(y = num_fires)) +
geom_abline(
intercept = model$coefficients[1],
slope = model$coefficients[2],
col = "red"
) +
ylim(0, 120000)
The fitted line does show a slight increase, but the p-value (0.37) is far too high for me to accept this model as a true representation of any underlying trend.
bootstrap_distribution_slope <- year_plot %>%
specify(formula = num_fires ~ fire_year) %>%
generate(reps = 10000, type = "bootstrap") %>%
calculate(stat = "slope")
slope_ci95 <- bootstrap_distribution_slope %>%
get_ci(level = 0.95, type = "percentile")
slope_ci95
bootstrap_distribution_slope %>%
visualise(bins = 30) +
shade_ci(endpoints = slope_ci95)
clean_names(tidy(model, conf.int = TRUE, conf.level = 0.95))
As 0 lies within the 95% confidence interval of -283 to +1075, this model cannot tell us whether there is any positive or negative trend in the data. A model designed for time series with seasonal variation will be more useful. That needs more data points, so from here on I use monthly rather than yearly counts.
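The classical interval for the slope can also be computed by hand from summary(model), and it should match what tidy(model, conf.int = TRUE) reports for fire_year. This sketch uses only the printed estimate, standard error and residual degrees of freedom. It gives roughly -440 to +1127, which differs from the figures quoted above but also contains 0, so the conclusion is the same.

```r
# Slope estimate and standard error copied from summary(model) above
estimate <- 343.3
std_error <- 377.7
df_resid <- 22

# 95% t-based interval: estimate +/- t(0.975, df) * SE
t_crit <- qt(0.975, df = df_resid)
ci <- estimate + c(-1, 1) * t_crit * std_error
round(ci)

# Does the interval contain 0?
ci[1] < 0 && ci[2] > 0
```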
fires_small %>%
mutate(year_month = make_date(fire_year, discovery_moy)) %>%
group_by(year_month) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = year_month, y = num_fires) +
geom_line(col = "dark blue")
The peaks still occur in the summer months, and the 2006 heatwave is especially visible.
monthly <- fires_small %>%
mutate(year_month = make_date(fire_year, discovery_moy)) %>%
group_by(year_month) %>%
summarise(num_fires = n())
write_csv(monthly, "clean_data/monthly.csv")
monthly
# Log-transform the counts to stabilise their variance
log_monthly <- log(monthly$num_fires)
# Autocorrelation (ACF) and partial autocorrelation (PACF) plots to check for stationarity and seasonality
acf(log_monthly)
pacf(log_monthly)
The lag plots clearly show repeating patterns, which are likely to reflect seasonal variation.
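Each bar in an ACF plot is a sample autocorrelation: the correlation between the series and a copy of itself shifted by k months. For monthly data with a yearly cycle, the bar at k = 12 is the one to watch. This minimal sketch computes it by hand and checks it against acf(); the built-in AirPassengers series (log-transformed) is a stand-in for log_monthly, and manual_acf() is a hypothetical helper.

```r
# Stand-in monthly series: log of the built-in AirPassengers data (1949 to 1960)
x <- as.numeric(log(AirPassengers))

# Sample autocorrelation at lag k, the quantity acf() plots:
#   r_k = sum_{t=1}^{n-k} (x_t - mean)(x_{t+k} - mean) / sum_{t=1}^{n} (x_t - mean)^2
manual_acf <- function(x, k) {
  dev <- x - mean(x)
  n <- length(x)
  sum(dev[1:(n - k)] * dev[(1 + k):n]) / sum(dev^2)
}

r12_manual <- manual_acf(x, 12)
r12_acf <- acf(x, lag.max = 24, plot = FALSE)$acf[13]  # element 1 is lag 0
c(manual = r12_manual, acf = r12_acf)
```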
adf.test(log_monthly)
p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: log_monthly
Dickey-Fuller = -8.3637, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
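With a p-value below 0.01 (the smallest value adf.test() reports), the test rejects its null hypothesis of a unit root, so the log series can be treated as stationary: it has no stochastic trend. The ADF test does not check for a repeating pattern; the evidence for seasonality comes from the ACF and PACF plots above.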
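To see what the Dickey-Fuller statistic actually measures, here is a minimal sketch of the regression described in the tseries::adf.test documentation: a constant, a linear trend and k lagged differences, with the default k = trunc((n - 1)^(1/3)). adf_stat() is a hypothetical helper, not the package's code, and it does not reproduce the p-value, which tseries interpolates from Dickey-Fuller tables. A strongly negative statistic is evidence against a unit root; the 5% critical value for this form of the test is roughly -3.4.

```r
# Hypothetical helper (not part of tseries): t-statistic of b in the ADF regression
#   diff(y)_t = a + d * t + b * y_{t-1} + sum_{i=1}^{k} g_i * diff(y)_{t-i} + e_t
# H0: b = 0 (unit root, non-stationary). H1: b < 0 (stationary).
adf_stat <- function(x, k = trunc((length(x) - 1)^(1 / 3))) {
  x <- as.numeric(x)
  dy <- diff(x)
  n <- length(dy)
  z <- embed(dy, k + 1)      # column 1: diff(y)_t; columns 2 to k + 1: lagged diffs
  dy_t <- z[, 1]
  y_lag1 <- x[(k + 1):n]     # y_{t-1}, aligned with the rows of z
  trend <- (k + 1):n
  fit <- if (k > 0) {
    lags <- z[, -1, drop = FALSE]
    lm(dy_t ~ y_lag1 + trend + lags)
  } else {
    lm(dy_t ~ y_lag1 + trend)
  }
  summary(fit)$coefficients["y_lag1", "t value"]
}

set.seed(42)
white_noise <- rnorm(200)          # stationary by construction
random_walk <- cumsum(rnorm(200))  # has a unit root by construction

adf_stat(white_noise)  # compare with a 5% critical value of roughly -3.4
adf_stat(random_walk)
```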
# Decomposing the time series data to examine any seasonal trends
arima_test <- ts(log_monthly, start = c(1992,01), frequency = 12)
components <- decompose(arima_test)
components
$x
Jan Feb Mar Apr May Jun Jul Aug
1992 8.143227 8.806724 9.127285 8.941415 9.070388 8.903136 8.915298 8.938138
1993 7.185387 8.472823 8.578288 8.821437 8.708144 8.843759 9.058703 8.960725
1994 8.090709 8.601534 9.350450 9.167537 8.837971 8.740497 9.369052 9.105979
1995 7.256297 8.708309 9.224342 9.209340 8.665613 8.504108 9.030855 8.870803
1996 8.081166 9.263976 9.172950 9.203718 8.804325 8.700847 9.174610 9.122274
1997 7.848934 8.214736 8.949495 9.236106 8.820995 8.372630 8.917445 8.714239
1998 7.087574 7.947679 8.777093 8.969542 8.466952 8.852522 9.247443 9.085684
1999 8.179760 8.692154 9.287579 9.276128 8.735686 8.714403 8.859363 9.422302
2000 8.549273 9.218507 9.363233 8.996776 8.994793 9.010669 9.375007 9.513034
2001 8.583543 8.814182 8.802673 9.169831 9.137662 8.722254 9.197356 9.007857
2002 8.183956 9.183791 9.147081 9.061028 8.915567 8.835065 9.274910 9.049115
2003 8.363342 7.975908 8.776321 9.198369 8.564458 8.573952 9.241839 9.116469
2004 8.196712 8.290794 9.458528 9.334326 8.620291 8.832296 9.060680 8.837246
2005 8.357728 8.513185 9.228082 9.450144 8.734077 8.720297 9.150697 9.050641
2006 9.169102 8.970305 9.847605 9.687009 9.056606 9.257701 9.657011 9.313799
2007 8.121183 8.983189 9.567665 9.258082 9.255027 9.016027 9.408781 9.171911
2008 8.640295 8.847360 9.209240 9.289706 8.885718 9.197356 9.233764 9.047586
2009 8.656955 9.178127 9.428431 9.303557 8.684739 8.652772 9.265775 8.819665
2010 7.827640 7.905442 9.117677 9.385134 8.539150 8.637107 9.113499 9.021719
2011 8.570355 9.216223 9.182249 9.056839 8.818482 9.156518 9.268704 9.343997
2012 8.318254 8.298540 8.993552 8.958797 8.674368 8.850661 9.395242 9.029777
2013 8.046229 8.091321 9.063347 8.907342 8.906935 8.632662 9.018090 9.022926
2014 8.679482 8.353497 9.121181 9.211739 8.752265 8.424200 9.058470 8.831128
2015 8.301770 8.594339 8.914223 9.157889 9.000483 8.745284 9.062188 9.056606
Sep Oct Nov Dec
1992 8.226573 8.208219 7.244228 6.993933
1993 8.497603 8.343316 8.087333 7.634337
1994 8.541300 7.914252 8.005701 7.271704
1995 8.883086 8.342602 7.999007 8.093157
1996 8.278936 8.055792 7.625595 7.380256
1997 8.689633 8.235095 7.368970 7.152269
1998 8.796944 8.534837 8.343316 8.052615
1999 9.148252 8.638348 8.919854 8.324336
2000 8.799963 8.885718 8.170186 7.811973
2001 8.494129 8.744010 9.289336 7.938802
2002 8.514189 7.579168 7.656337 7.486053
2003 8.423542 8.434898 8.074026 7.774015
2004 8.158516 7.653020 7.634337 7.584265
2005 8.927712 8.645059 8.981304 8.520587
2006 8.501064 8.346879 8.073091 8.130648
2007 8.798757 8.480114 8.353261 8.089176
2008 8.346168 8.417594 8.476163 7.855157
2009 8.443977 7.556428 7.990577 6.898715
2010 8.939843 9.066932 8.507951 8.363809
2011 8.996157 8.455743 8.209308 7.698936
2012 8.533067 8.206856 8.467162 7.656810
2013 8.472614 7.853993 8.240385 7.430707
2014 8.077137 8.028455 8.269757 7.580189
2015 8.625330 8.711608 8.105911 7.370860
$seasonal
Jan Feb Mar Apr May Jun
1992 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
1993 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
1994 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
1995 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
1996 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
1997 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
1998 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
1999 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2000 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2001 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2002 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2003 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2004 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2005 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2006 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2007 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2008 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2009 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2010 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2011 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2012 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2013 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2014 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
2015 -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
Jul Aug Sep Oct Nov Dec
1992 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
1993 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
1994 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
1995 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
1996 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
1997 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
1998 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
1999 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2000 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2001 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2002 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2003 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2004 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2005 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2006 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2007 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2008 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2009 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2010 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2011 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2012 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2013 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2014 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
2015 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
$trend
Jan Feb Mar Apr May Jun Jul Aug
1992 NA NA NA NA NA NA 8.419970 8.366148
1993 8.267328 8.274244 8.286478 8.303400 8.344158 8.405971 8.470376 8.513461
1994 8.627157 8.646141 8.654014 8.637957 8.616678 8.598167 8.548290 8.517972
1995 8.467242 8.443351 8.447793 8.479882 8.497451 8.531399 8.599996 8.657518
1996 8.709864 8.726332 8.711637 8.674514 8.647004 8.601741 8.562361 8.508966
1997 8.412648 8.384931 8.385042 8.409625 8.406404 8.386212 8.344989 8.302138
1998 8.278668 8.307894 8.327843 8.344803 8.397890 8.476002 8.559024 8.635552
1999 8.729376 8.727232 8.755895 8.774846 8.803181 8.838526 8.865244 8.902571
2000 8.975294 9.000560 8.989828 8.985623 8.964694 8.912109 8.892189 8.876770
2001 8.808100 8.779649 8.745857 8.727209 8.767936 8.819852 8.808487 8.807238
2002 8.836396 8.841347 8.843902 8.796202 8.679626 8.592719 8.581329 8.538475
2003 8.416299 8.417727 8.416757 8.448635 8.501694 8.531097 8.536152 8.542329
2004 8.642263 8.623080 8.600403 8.556782 8.505884 8.479657 8.478460 8.494435
2005 8.498049 8.510691 8.551632 8.625017 8.722476 8.817613 8.890433 8.943287
2006 9.126457 9.158518 9.151706 9.121505 9.071239 9.017149 8.957238 8.914112
2007 8.841629 8.825374 8.831866 8.849822 8.867047 8.876993 8.896894 8.912864
2008 8.857014 8.844541 8.820503 8.799040 8.801556 8.796926 8.787870 8.802346
2009 8.774752 8.766589 8.761167 8.729361 8.673246 8.613162 8.538755 8.451172
2010 8.359262 8.361336 8.390416 8.474015 8.558510 8.641113 8.733105 8.818667
2011 8.924335 8.944230 8.960005 8.936885 8.898975 8.858829 8.820622 8.771881
2012 8.677524 8.669704 8.637316 8.607650 8.608024 8.617012 8.603923 8.583954
2013 8.562348 8.546348 8.543543 8.526322 8.502170 8.483300 8.500265 8.537574
2014 8.550106 8.543797 8.519327 8.510118 8.518611 8.526063 8.516554 8.510851
2015 8.546749 8.556298 8.588535 8.639841 8.661479 8.645930 NA NA
Sep Oct Nov Dec
1992 8.329360 8.301486 8.281394 8.263826
1993 8.550997 8.597592 8.617422 8.618529
1994 8.517166 8.513654 8.508214 8.491183
1995 8.678530 8.676154 8.681700 8.695677
1996 8.455937 8.447976 8.450020 8.437039
1997 8.283827 8.265537 8.239678 8.244922
1998 8.687842 8.721887 8.745858 8.751301
1999 8.927655 8.919168 8.918324 8.941465
2000 8.836566 8.820420 8.833584 8.827519
2001 8.836989 8.846805 8.833018 8.828464
2002 8.472699 8.462973 8.454066 8.428557
2003 8.583875 8.617965 8.625956 8.639047
2004 8.494100 8.489323 8.498890 8.498965
2005 8.988147 9.023830 9.047138 9.082969
2006 8.902985 8.873449 8.863844 8.862042
2007 8.892270 8.878654 8.864583 8.856751
2008 8.825261 8.834971 8.827174 8.796109
2009 8.385195 8.375646 8.372979 8.366260
2010 8.875974 8.864985 8.862945 8.896226
2011 8.725782 8.713834 8.703744 8.684995
2012 8.578228 8.578993 8.586539 8.587146
2013 8.550908 8.566001 8.572240 8.557109
2014 8.512263 8.501396 8.509494 8.533215
2015 NA NA NA NA
$random
Jan Feb Mar Apr May
1992 NA NA NA NA NA
1993 -0.6233809577 0.2210859405 -0.2162835416 -0.0209912924 0.2069981699
1994 -0.0778891050 -0.0220999711 0.1883422030 -0.0094486294 0.0643057584
1995 -0.7523847346 0.2874648696 0.2684548675 0.1904289896 0.0111744696
1996 -0.1701387014 0.5601507230 -0.0467807958 -0.0098239152 0.0003330778
1997 -0.1051545539 -0.1476887992 0.0563588626 0.2874512506 0.2576032032
1998 -0.7325343120 -0.3377092540 -0.0588436413 0.0857099361 -0.0879257640
1999 -0.0910557088 -0.0125705541 0.0235900729 -0.0377467128 -0.2244831524
2000 0.0325386191 0.2404537960 -0.1346890460 -0.5278762981 -0.1268889030
2001 0.2340021678 0.0570395770 -0.4512778124 -0.0964071698 0.2127385062
2002 -0.1938809368 0.3649509021 -0.2049145287 -0.2742032314 0.0789532160
2003 0.4056031774 -0.4193124026 -0.1485293225 0.2107048520 -0.0942237383
2004 0.0130089330 -0.3097795052 0.3500303370 0.2385150365 -0.0425804086
2005 0.3182391310 0.0250007048 0.1683557432 0.2860983870 -0.1453861909
2006 0.5012041268 -0.1657069193 0.1878044761 0.0264751592 -0.1716202969
2007 -0.2618864082 0.1803210569 0.2277049440 -0.1307680274 0.2309924287
2008 0.2418410992 0.0253251950 -0.1193573635 -0.0483632270 -0.0728262247
2009 0.3407631199 0.4340449334 0.1591698232 0.0351676969 -0.1454944827
2010 -0.0730631732 -0.4333881389 0.2191663716 0.3720896909 -0.1763473582
2011 0.1045792522 0.2944994489 -0.2858492688 -0.4190743966 -0.2374808430
2012 0.0992901576 -0.3486575784 -0.1518581830 -0.1878816359 -0.0906435908
2013 -0.0575592341 -0.4325199112 0.0117099464 -0.1580092312 0.2477773517
2014 0.5879358157 -0.1677931192 0.0937602986 0.1625925185 0.0766665120
2015 0.2135807753 0.0605475553 -0.1824058934 -0.0209812010 0.1820168093
Jun Jul Aug Sep Oct
1992 NA -0.0449509633 0.1575087342 -0.0437845200 0.2640819290
1993 0.3085539972 0.0480481706 0.0327820440 0.0056074889 0.1030732247
1994 0.0130954944 0.2804833091 0.1735260613 0.0831362965 -0.2420523266
1995 -0.1565256653 -0.1094199513 -0.2011963255 0.2635580823 0.0237964250
1996 -0.0301283742 0.0719701407 0.1988264146 -0.1179987392 -0.0348344932
1997 -0.1428161289 0.0321772594 -0.0023803391 0.4648078450 0.3269074670
1998 0.2472854583 0.1481404093 0.0356503782 0.1681041910 0.1702988697
1999 -0.2533564798 -0.5461588727 0.1052488685 0.2795995836 0.0765297812
2000 -0.0306745492 -0.0574605672 0.2217828960 0.0223992451 0.4226463621
2001 -0.2268317961 -0.1514090624 -0.2138622447 -0.2838568677 0.2545536127
2002 0.1131112304 0.1533021643 0.0961580124 0.1004925057 -0.5264558037
2003 -0.0863793198 0.1654082837 0.1596583410 -0.1013309156 0.1742818811
2004 0.2234043597 0.0419409644 -0.0716710035 -0.2765810114 -0.4789540583
2005 -0.2265497504 -0.2800155713 -0.3071277015 -0.0014328522 -0.0214226481
2006 0.1113174957 0.1594943028 -0.0147943085 -0.3429184786 -0.1692202947
2007 0.0098004143 -0.0283915317 -0.1554343610 -0.0345114221 -0.0411904481
2008 0.2711958802 -0.0943844490 -0.1692410666 -0.4200907602 -0.0600280180
2009 -0.0896235184 0.1867412670 -0.0459878178 0.1177843980 -0.4618690336
2010 -0.1332398421 -0.1598845677 -0.2114296623 0.1228717981 0.5592954861
2011 0.1684543318 -0.0921967996 0.1576346081 0.3293774120 0.0992582136
2012 0.1044142300 0.2510404874 0.0313415740 0.0138405581 -0.0147870883
2013 0.0201278835 -0.0224538500 0.0708700702 -0.0192916141 -0.3546590289
2014 -0.2310973106 0.0016379769 -0.0942046225 -0.3761236738 -0.1155914613
2015 -0.0298794915 NA NA NA NA
Nov Dec
1992 -0.5647630400 -0.3516096980
1993 -0.0576857033 -0.0659079015
1994 -0.0301097753 -0.3011952612
1995 -0.2102891499 0.3157635609
1996 -0.3520215661 -0.1384993286
1997 -0.3983046978 -0.1743696155
1998 0.0698607813 0.2195977677
1999 0.4739337289 0.3011555153
2000 -0.1909948168 -0.0972622783
2001 0.9287217703 0.0286215357
2002 -0.3253253092 -0.0242202677
2003 -0.0795267746 0.0532517319
2004 -0.3921497404 0.0035837222
2005 0.4065695347 0.3559022915
2006 -0.3183495899 0.1868897500
2007 -0.0389185632 0.1507084736
2008 0.1213922737 -0.0226681149
2009 0.0900014144 -0.5492616459
2010 0.1174087339 0.3858664418
2011 -0.0220324164 -0.0677754767
2012 0.3530267510 -0.0120520989
2013 0.1405486739 -0.2081184952
2014 0.2326660300 -0.0347420587
2015 NA NA
$figure
[1] -0.45855962 -0.02250662 0.50809392 0.53902883 0.15698768 0.12923426
[7] 0.54027865 0.41448138 -0.05900239 -0.35734902 -0.47240336 -0.91828371
$type
[1] "additive"
attr(,"class")
[1] "decomposed.ts"
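decompose() estimates the trend with a centred moving average as long as one seasonal cycle. For an even period such as 12 months it uses a 2x12 average: half weight on the two end months and full weight on the 11 in between. That is why the first and last six $trend values are NA. As a check, this sketch reproduces the first printed trend value (July 1992, 8.419970) from the first 13 values of $x.

```r
# First 13 values of log_monthly (January 1992 to January 1993), copied from $x above
x <- c(8.143227, 8.806724, 9.127285, 8.941415, 9.070388, 8.903136,
       8.915298, 8.938138, 8.226573, 8.208219, 7.244228, 6.993933,
       7.185387)

# Centred 2x12 moving average weights
weights <- c(0.5, rep(1, 11), 0.5) / 12

# Weighted sum by hand, and the same filter applied with stats::filter()
trend_jul_1992 <- sum(weights * x)
trend_by_filter <- as.numeric(stats::filter(x, weights, sides = 2))[7]

round(c(by_hand = trend_jul_1992, by_filter = trend_by_filter), 6)
```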
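There is clearly a seasonal component: on the log scale the seasonal figure ranges from -0.92 in December to +0.54 in July. However, decompose() repeats the same figure every year by construction, so let's look at a plot to make sure.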
plot(components)
The repeating seasonal pattern is clearly visible in this plot.
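Because the decomposition was run on log(counts), the additive seasonal figure acts multiplicatively on the raw monthly counts. This sketch takes exp() of the printed figure to turn it into factors relative to the trend level: July runs at roughly 1.7 times the trend and December at roughly 0.4 times.

```r
# Seasonal figure for January to December, copied from components$figure above
figure <- c(-0.45855962, -0.02250662, 0.50809392, 0.53902883, 0.15698768,
            0.12923426, 0.54027865, 0.41448138, -0.05900239, -0.35734902,
            -0.47240336, -0.91828371)
names(figure) <- month.abb

# decompose() centres the figure so it sums to (numerically) zero
sum(figure)

# Additive on the log scale = multiplicative on the counts
round(exp(figure), 2)

names(which.max(figure))  # "Jul"
names(which.min(figure))  # "Dec"
```

The spring months (March and April) sit almost as high as July, so the yearly cycle has two busy periods rather than a single summer peak.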
# Use auto.arima() to search for the best ARIMA configuration, with the KPSS test choosing the order of differencing and BIC ranking the candidates
fit_log_monthly <- auto.arima(arima_test, trace = TRUE, test = "kpss", ic = "bic")
Fitting models using approximations to speed things up...
ARIMA(2,0,2)(1,1,1)[12] with drift : Inf
ARIMA(0,0,0)(0,1,0)[12] with drift : 386.3822
ARIMA(1,0,0)(1,1,0)[12] with drift : 210.0305
ARIMA(0,0,1)(0,1,1)[12] with drift : 206.1551
ARIMA(0,0,0)(0,1,0)[12] : 380.8388
ARIMA(0,0,1)(0,1,0)[12] with drift : 332.9606
ARIMA(0,0,1)(1,1,1)[12] with drift : 199.5213
ARIMA(0,0,1)(1,1,0)[12] with drift : 244.7824
ARIMA(0,0,1)(2,1,1)[12] with drift : 204.6073
ARIMA(0,0,1)(1,1,2)[12] with drift : 194.5689
ARIMA(0,0,1)(0,1,2)[12] with drift : 211.4944
ARIMA(0,0,1)(2,1,2)[12] with drift : Inf
ARIMA(0,0,0)(1,1,2)[12] with drift : 249.2132
ARIMA(1,0,1)(1,1,2)[12] with drift : 149.1018
ARIMA(1,0,1)(0,1,2)[12] with drift : 161.5843
ARIMA(1,0,1)(1,1,1)[12] with drift : 146.0692
ARIMA(1,0,1)(0,1,1)[12] with drift : 157.8235
ARIMA(1,0,1)(1,1,0)[12] with drift : 197.9303
ARIMA(1,0,1)(2,1,1)[12] with drift : 161.8091
ARIMA(1,0,1)(0,1,0)[12] with drift : 292.8607
ARIMA(1,0,1)(2,1,0)[12] with drift : 188.0216
ARIMA(1,0,1)(2,1,2)[12] with drift : 153.8294
ARIMA(1,0,0)(1,1,1)[12] with drift : 153.6335
ARIMA(2,0,1)(1,1,1)[12] with drift : 147.4259
ARIMA(1,0,2)(1,1,1)[12] with drift : 151.6193
ARIMA(0,0,0)(1,1,1)[12] with drift : 256.4293
ARIMA(0,0,2)(1,1,1)[12] with drift : 183.564
ARIMA(2,0,0)(1,1,1)[12] with drift : 141.9116
ARIMA(2,0,0)(0,1,1)[12] with drift : 160.3183
ARIMA(2,0,0)(1,1,0)[12] with drift : 201.1907
ARIMA(2,0,0)(2,1,1)[12] with drift : 165.2805
ARIMA(2,0,0)(1,1,2)[12] with drift : Inf
ARIMA(2,0,0)(0,1,0)[12] with drift : 294.3017
ARIMA(2,0,0)(0,1,2)[12] with drift : 163.7419
ARIMA(2,0,0)(2,1,0)[12] with drift : 189.6952
ARIMA(2,0,0)(2,1,2)[12] with drift : 160.4628
ARIMA(3,0,0)(1,1,1)[12] with drift : 140.895
ARIMA(3,0,0)(0,1,1)[12] with drift : 166.2635
ARIMA(3,0,0)(1,1,0)[12] with drift : 202.3331
ARIMA(3,0,0)(2,1,1)[12] with drift : 167.8427
ARIMA(3,0,0)(1,1,2)[12] with drift : 146.502
ARIMA(3,0,0)(0,1,0)[12] with drift : 299.1173
ARIMA(3,0,0)(0,1,2)[12] with drift : 169.2991
ARIMA(3,0,0)(2,1,0)[12] with drift : 190.1357
ARIMA(3,0,0)(2,1,2)[12] with drift : 164.0628
ARIMA(4,0,0)(1,1,1)[12] with drift : 148.044
ARIMA(3,0,1)(1,1,1)[12] with drift : Inf
ARIMA(4,0,1)(1,1,1)[12] with drift : 152.986
ARIMA(3,0,0)(1,1,1)[12] : 135.3069
ARIMA(3,0,0)(0,1,1)[12] : 161.3958
ARIMA(3,0,0)(1,1,0)[12] : 196.7164
ARIMA(3,0,0)(2,1,1)[12] : 162.2665
ARIMA(3,0,0)(1,1,2)[12] : 140.8981
ARIMA(3,0,0)(0,1,0)[12] : 293.5803
ARIMA(3,0,0)(0,1,2)[12] : 164.2536
ARIMA(3,0,0)(2,1,0)[12] : 184.517
ARIMA(3,0,0)(2,1,2)[12] : 158.4784
ARIMA(2,0,0)(1,1,1)[12] : 136.5386
ARIMA(4,0,0)(1,1,1)[12] : 142.4368
ARIMA(3,0,1)(1,1,1)[12] : 140.8894
ARIMA(2,0,1)(1,1,1)[12] : 142.018
ARIMA(4,0,1)(1,1,1)[12] : 147.3867
Now re-fitting the best model(s) without approximations...
ARIMA(3,0,0)(1,1,1)[12] : 129.5424
Best model: ARIMA(3,0,0)(1,1,1)[12]
fit_log_monthly
Series: arima_test
ARIMA(3,0,0)(1,1,1)[12]
Coefficients:
ar1 ar2 ar3 sar1 sma1
0.4104 0.1781 0.1091 -0.1107 -0.8207
s.e. 0.0600 0.0644 0.0610 0.0734 0.0515
sigma^2 estimated as 0.07962: log likelihood=-47.91
AIC=107.82 AICc=108.13 BIC=129.54
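auto.arima() selected ARIMA(3,0,0)(1,1,1)[12], the candidate with the lowest BIC (129.54 after re-fitting without approximations). Because the search is stepwise, this is the best of the models it tried rather than of every possible model.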
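The three information criteria printed for the fit can be rebuilt from the log likelihood. This sketch assumes two counting conventions that reproduce the printed values: sigma^2 counts as a parameter alongside the five coefficients, and the effective sample size is the 288 months minus the 12 lost to seasonal differencing. BIC charges log(276), about 5.6, per parameter against AIC's 2, which is why it favours smaller models.

```r
# Values printed by fit_log_monthly above
log_lik <- -47.91
n_coef <- 5              # ar1, ar2, ar3, sar1, sma1
n_par <- n_coef + 1      # assumption: the innovation variance sigma^2 is counted too
n_eff <- 24 * 12 - 12    # assumption: 288 months minus 12 lost to seasonal differencing

aic  <- -2 * log_lik + 2 * n_par
aicc <- aic + 2 * n_par * (n_par + 1) / (n_eff - n_par - 1)
bic  <- -2 * log_lik + n_par * log(n_eff)

round(c(AIC = aic, AICc = aicc, BIC = bic), 2)
```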
confint(fit_log_monthly)
2.5 % 97.5 %
ar1 0.29277203 0.52793840
ar2 0.05192296 0.30419421
ar3 -0.01051291 0.22875135
sar1 -0.25453068 0.03310203
sma1 -0.92163837 -0.71985077
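The confint() values above match estimate ± 1.96 × s.e. to about three decimals, which suggests they are normal-approximation (Wald) intervals. This sketch rebuilds them from the printed coefficients and flags the terms whose interval contains 0. Here that is ar3 and sar1, so those two terms are only weakly supported by the data.

```r
# Coefficients and standard errors printed by fit_log_monthly above
est <- c(ar1 = 0.4104, ar2 = 0.1781, ar3 = 0.1091, sar1 = -0.1107, sma1 = -0.8207)
se  <- c(ar1 = 0.0600, ar2 = 0.0644, ar3 = 0.0610, sar1 = 0.0734, sma1 = 0.0515)

# Normal-approximation 95% interval: estimate +/- z * s.e.
z <- qnorm(0.975)
ci <- cbind(lower = est - z * se, upper = est + z * se)
round(ci, 3)

# Which coefficients have an interval that contains 0?
contains_zero <- ci[, "lower"] < 0 & ci[, "upper"] > 0
names(est)[contains_zero]   # "ar3" "sar1"
```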
monthly_num_fires <- monthly$num_fires
acf(monthly_num_fires)
pacf(monthly_num_fires)
adf.test(monthly_num_fires)
p-value smaller than printed p-value
Augmented Dickey-Fuller Test
data: monthly_num_fires
Dickey-Fuller = -7.9715, Lag order = 6, p-value = 0.01
alternative hypothesis: stationary
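# The data start in January 1992, as in arima_test above. The decomposition printed
# below was produced with start = c(1991, 1), so its year labels run one year early.
monthly_arima <- ts(monthly_num_fires, start = c(1992, 1), frequency = 12)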
components_monthly <- decompose(monthly_arima)
components_monthly
$x
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1991 3440 6679 9203 7642 8694 7355 7445 7617 3739 3671 1400 1090
1992 1320 4783 5315 6778 6052 6931 8593 7791 4903 4202 3253 2068
1993 3264 5440 11504 9581 6891 6251 11720 9009 5122 2736 2998 1439
1994 1417 6053 10141 9990 5800 4935 8357 7121 7209 4199 2978 3272
1995 3233 10551 9633 9934 6663 6008 9649 9157 3940 3152 2050 1604
1996 2563 3695 7704 10261 6775 4327 7461 6089 5941 3771 1586 1277
1997 1197 2829 6484 7860 4755 6992 10378 8828 6614 5089 4202 3142
1998 3568 5956 10803 10680 6221 6090 7040 12361 9398 5644 7479 4123
1999 5163 10082 11652 8077 8061 8190 11790 13535 6634 7228 3534 2470
2000 5343 6729 6652 9603 9299 6138 9871 8167 4886 6273 10822 2804
2001 3583 9738 9387 8613 7447 6871 10667 8511 4985 1957 2114 1783
2002 4287 2910 6479 9881 5242 5292 10320 9104 4553 4605 3210 2378
2003 3629 3987 12817 11320 5543 6852 8610 6886 3493 2107 2068 1967
2004 4263 4980 10179 12710 6211 6126 9421 8524 7538 5682 7953 5017
2005 9596 7866 18913 16107 8575 10485 15631 11090 4920 4217 3207 3397
2006 3365 7968 14295 10489 10457 8234 12195 9623 6626 4818 4244 3259
2007 5655 6956 9989 10826 7228 9871 10237 8498 4214 4526 4799 2579
2008 5750 9683 12437 10977 5912 5726 10570 6766 4647 1913 2953 991
2009 2509 2712 9115 11910 5111 5637 9077 8281 7630 8664 4954 4289
2010 5273 10059 9723 8577 6758 9476 10601 11430 8072 4702 3675 2206
2011 4098 4018 8051 7776 5851 6979 12031 8348 5080 3666 4756 2115
2012 3122 3266 8633 7386 7383 5612 8251 8291 4782 2576 3791 1687
2013 5881 4245 9147 10014 6325 4556 8591 6844 3220 3067 3904 1959
2014 4031 5401 7437 9489 8107 6281 8623 8575 5571 6073 3314 1589
$seasonal
Jan Feb Mar Apr May Jun
1991 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
1992 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
1993 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
1994 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
1995 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
1996 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
1997 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
1998 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
1999 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2000 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2001 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2002 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2003 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2004 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2005 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2006 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2007 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2008 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2009 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2010 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2011 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2012 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2013 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
2014 -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
Jul Aug Sep Oct Nov Dec
1991 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
1992 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
1993 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
1994 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
1995 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
1996 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
1997 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
1998 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
1999 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2000 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2001 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2002 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2003 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2004 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2005 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2006 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2007 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2008 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2009 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2010 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2011 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2012 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2013 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
2014 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
$trend
Jan Feb Mar Apr May Jun Jul
1991 NA NA NA NA NA NA 5576.250
1992 4726.250 4781.333 4837.083 4907.708 5007.042 5125.000 5246.750
1993 6275.375 6456.417 6516.292 6464.333 6392.625 6355.792 6252.625
1994 5806.542 5587.750 5596.042 5743.958 5804.083 5879.625 6031.667
1995 6650.333 6789.000 6737.625 6557.792 6475.500 6367.333 6269.917
1996 5315.250 5096.250 5051.792 5160.958 5167.417 5134.458 5063.917
1997 4808.375 5044.042 5186.208 5269.167 5433.083 5619.792 5796.292
1998 6658.500 6666.625 6929.833 7068.958 7228.625 7406.042 7513.375
1999 8303.750 8550.583 8484.333 8435.167 8336.792 8103.542 8042.167
2000 7332.958 7029.333 6732.833 6620.208 6884.083 7201.667 7142.250
2001 7405.000 7452.500 7470.958 7295.250 6752.583 6347.208 6334.000
2002 5327.875 5338.125 5344.833 5437.167 5593.167 5663.625 5661.000
2003 6455.250 6291.583 6155.000 6006.750 5855.083 5790.375 5799.667
2004 5833.792 5935.833 6172.625 6490.125 6884.292 7256.583 7605.875
2005 9898.500 10264.167 10262.000 10091.875 9833.083 9567.833 9240.708
2006 7962.667 7758.375 7768.333 7864.458 7932.708 7970.167 8059.833
2007 7525.917 7397.458 7250.083 7137.417 7148.375 7143.167 7118.792
2008 7125.375 7067.083 7012.958 6922.125 6736.333 6593.250 6392.042
2009 5340.625 5341.542 5528.958 5934.542 6299.208 6520.000 6772.583
2010 7793.583 7988.292 8137.917 7991.250 7772.875 7632.792 7497.042
2011 6514.500 6445.667 6192.583 6024.750 6026.625 6067.875 6023.417
2012 5792.333 5632.458 5617.667 5559.833 5474.208 5416.167 5513.292
2013 5809.667 5763.542 5638.167 5593.542 5618.708 5634.750 5569.000
2014 5695.583 5769.042 5939.125 6162.333 6263.000 6223.000 NA
Aug Sep Oct Nov Dec
1991 5408.917 5167.917 4969.917 4823.833 4696.083
1992 5355.125 5640.375 6015.042 6166.792 6173.417
1993 6201.208 6169.958 6130.208 6101.792 6001.500
1994 6294.750 6461.000 6437.500 6471.125 6551.792
1995 5956.333 5590.292 5523.542 5541.833 5476.458
1996 4970.917 4884.000 4733.125 4548.917 4575.792
1997 6025.375 6335.625 6633.083 6811.667 6835.167
1998 7751.750 7959.042 7885.958 7854.167 8018.333
1999 7909.958 7561.917 7417.167 7532.333 7498.417
2000 7194.292 7433.625 7506.333 7387.917 7341.292
2001 6078.833 5673.167 5604.833 5565.792 5408.125
2002 5678.458 5987.417 6311.458 6383.958 6461.500
2003 5867.458 5798.917 5746.917 5832.667 5830.250
2004 7948.333 8432.500 8937.958 9178.000 9458.125
2005 8985.333 8797.167 8370.667 8215.000 8199.625
2006 8113.083 7891.500 7726.125 7605.625 7539.292
2007 7236.375 7452.000 7560.292 7511.750 7284.208
2008 5966.542 5537.667 5438.125 5443.625 5406.542
2009 7193.875 7525.333 7411.792 7341.542 7570.125
2010 7196.375 6875.000 6771.958 6700.792 6558.958
2011 5951.417 5944.333 5952.333 5999.917 6006.792
2012 5669.042 5731.250 5862.167 5927.583 5839.500
2013 5540.083 5517.000 5423.875 5476.250 5622.375
2014 NA NA NA NA NA
$random
Jan Feb Mar Apr May
1991 NA NA NA NA NA
1992 -884.129227 449.526570 -2833.647343 -1535.732488 784.666063
1993 -489.254227 -568.556763 1676.144324 -289.357488 238.082729
1994 -1867.420894 913.109903 1233.394324 840.017512 -264.375604
1995 -895.212560 4209.859903 -416.189010 -29.815821 -72.792271
1996 -230.129227 -953.390097 -659.355676 1694.017512 1347.291063
1997 -1089.254227 -1767.181763 -2013.772343 -815.190821 -938.375604
1998 -568.379227 -262.765097 561.602657 205.017512 -1267.917271
1999 -618.629227 1979.276570 -143.897343 -3764.190821 -536.083937
2000 532.162440 147.526570 -3392.397343 -423.232488 2154.624396
2001 -1299.879227 2733.359903 -1395.522343 -2088.274155 434.124396
2002 1481.245773 -1980.265097 -2177.397343 1037.809179 -611.458937
2003 -304.129227 -1856.723430 3350.435990 1907.225845 -572.375604
2004 951.329106 -507.973430 694.810990 2813.850845 -933.583937
2005 2219.620773 -1950.306763 5339.435990 2609.100845 -1518.375604
2006 -2075.545894 657.484903 3215.102657 -781.482488 2263.999396
2007 651.204106 6.401570 -572.647343 282.559179 -180.667271
2008 1146.745773 3063.776570 2112.477657 648.850845 -1084.625604
2009 -309.504227 -2181.681763 274.477657 2569.434179 -1448.500604
2010 1.537440 2518.568237 -1726.480676 -2820.274155 -1275.167271
2011 105.620773 -1979.806763 -1453.147343 -1654.774155 -435.917271
2012 -148.212560 -1918.598430 -296.230676 -1579.857488 1648.499396
2013 2593.454106 -1070.681763 197.269324 1014.434179 445.999396
2014 857.537440 79.818237 -1813.689010 -79.357488 1583.707729
Jun Jul Aug Sep Oct
1991 NA -1532.239734 -36.107488 -473.187198 947.437802
1992 1672.122585 -54.739734 191.684179 218.354469 433.312802
1993 -238.669082 2066.385266 563.600845 -92.228865 -1147.853865
1994 -1078.502415 -1075.656401 -1417.940821 1703.729469 7.854469
1995 -493.210749 -21.906401 956.475845 -694.562198 -125.187198
1996 -941.335749 -1003.906401 -1126.107488 2012.729469 1284.229469
1997 1238.330918 1180.718599 558.434179 1234.104469 702.271135
1998 -1449.919082 -3874.364734 2365.059179 2394.687802 4.396135
1999 -47.419082 346.843599 3380.850845 27.812802 2057.187802
2000 -1197.544082 -672.239734 -1271.482488 -1591.895531 1013.021135
2001 389.914251 932.010266 187.975845 267.562802 -1401.478865
2002 -505.502415 1258.010266 1181.350845 -478.687198 539.896135
2003 927.747585 -590.656401 -1225.649155 -1350.187198 -1393.562198
2004 -1264.460749 -1585.864734 -1668.524155 61.229469 -1009.603865
2005 783.289251 2989.301932 -139.524155 -2921.437198 -1907.312198
2006 129.955918 734.176932 -734.274155 -309.770531 -661.770531
2007 2593.955918 -282.781401 -982.565821 -2282.270531 -787.937198
2008 -1001.127415 776.968599 -1444.732488 65.062802 -1278.770531
2009 -1016.877415 -1096.573068 -1157.065821 1060.396135 3498.562802
2010 1709.330918 -297.031401 1989.434179 2152.729469 176.396135
2011 777.247585 2606.593599 152.392512 91.396135 -39.978865
2012 61.955918 -663.281401 377.767512 6.479469 -1039.812198
2013 -1212.627415 -378.989734 -940.274155 -1341.270531 -110.520531
2014 -75.877415 NA NA NA NA
Nov Dec
1991 -891.065821 446.022947
1992 -381.024155 -53.310386
1993 -571.024155 -510.393720
1994 -960.357488 772.314614
1995 -959.065821 179.647947
1996 -430.149155 753.314614
1997 -76.899155 358.939614
1998 2157.600845 156.772947
1999 -1465.565821 -976.310386
2000 5966.850845 -485.185386
2001 -919.024155 426.981280
2002 -641.190821 -31.393720
2003 -1231.899155 188.856280
2004 1307.767512 -389.018720
2005 -2475.232488 -750.518720
2006 -828.857488 -228.185386
2007 -179.982488 -653.102053
2008 42.142512 -363.435386
2009 145.225845 770.981280
2010 -493.024155 -300.852053
2011 1288.850845 160.314614
2012 396.184179 -100.393720
2013 960.517512 388.731280
2014 NA NA
$figure
[1] -2522.1208 -447.8599 3311.5640 3406.0242 260.2923 133.8774
[7] 3400.9897 2244.1908 -955.7295 -2246.3545 -2532.7675 -4052.1063
$type
[1] "additive"
attr(,"class")
[1] "decomposed.ts"
plot(components_monthly)
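The printed components fit together additively. Wherever the trend is defined, each observed monthly count equals trend + seasonal + random. `decompose()` also centres the 12 seasonal `$figure` values so they sum to zero, and the values printed above do so to rounding. The trend is a centred 2x12 moving average, which is why the first and last six months of `$trend` and `$random` are NA. The sketch below checks these properties with base R's `stats::decompose()`. It uses a simulated monthly series of made-up numbers, not the fire counts.

```r
# Simulated monthly series: linear trend + annual cycle + noise (made-up data)
set.seed(42)
t_idx <- 1:72
sim <- ts(1000 + 5 * t_idx + 300 * sin(2 * pi * t_idx / 12) + rnorm(72, sd = 50),
          start = c(1992, 1), frequency = 12)

parts <- decompose(sim, type = "additive")

# Reassemble the series from its components wherever the trend is defined
recon <- as.numeric(parts$trend + parts$seasonal + parts$random)
ok <- !is.na(recon)
all.equal(as.numeric(sim)[ok], recon[ok])   # TRUE

# The seasonal figure is centred, so its 12 values sum to (numerically) zero
sum(parts$figure)

# The centred moving average loses 6 months at each end of the series
sum(is.na(parts$trend))                      # 12
```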
fit_monthly <- auto.arima(monthly_arima, trace = TRUE, test = "kpss", ic = "bic")
Fitting models using approximations to speed things up...
ARIMA(2,0,2)(1,1,1)[12] with drift : Inf
ARIMA(0,0,0)(0,1,0)[12] with drift : 4940.673
ARIMA(1,0,0)(1,1,0)[12] with drift : 4820.157
ARIMA(0,0,1)(0,1,1)[12] with drift : 4780.66
ARIMA(0,0,0)(0,1,0)[12] : 4935.076
ARIMA(0,0,1)(0,1,0)[12] with drift : 4905.171
ARIMA(0,0,1)(1,1,1)[12] with drift : 4792.821
ARIMA(0,0,1)(0,1,2)[12] with drift : 4785.781
ARIMA(0,0,1)(1,1,0)[12] with drift : 4835.391
ARIMA(0,0,1)(1,1,2)[12] with drift : 4789.521
ARIMA(0,0,0)(0,1,1)[12] with drift : 4823.973
ARIMA(1,0,1)(0,1,1)[12] with drift : 4757.861
ARIMA(1,0,1)(0,1,0)[12] with drift : 4893.217
ARIMA(1,0,1)(1,1,1)[12] with drift : 4762.762
ARIMA(1,0,1)(0,1,2)[12] with drift : 4763.454
ARIMA(1,0,1)(1,1,0)[12] with drift : 4810.19
ARIMA(1,0,1)(1,1,2)[12] with drift : 4766.262
ARIMA(1,0,0)(0,1,1)[12] with drift : 4766.611
ARIMA(2,0,1)(0,1,1)[12] with drift : 4764.112
ARIMA(1,0,2)(0,1,1)[12] with drift : 4762.623
ARIMA(0,0,2)(0,1,1)[12] with drift : 4780.193
ARIMA(2,0,0)(0,1,1)[12] with drift : 4767.588
ARIMA(2,0,2)(0,1,1)[12] with drift : 4767.744
ARIMA(1,0,1)(0,1,1)[12] : 4753.056
ARIMA(1,0,1)(0,1,0)[12] : 4887.636
ARIMA(1,0,1)(1,1,1)[12] : 4757.169
ARIMA(1,0,1)(0,1,2)[12] : 4758.612
ARIMA(1,0,1)(1,1,0)[12] : 4804.576
ARIMA(1,0,1)(1,1,2)[12] : 4760.879
ARIMA(0,0,1)(0,1,1)[12] : 4775.902
ARIMA(1,0,0)(0,1,1)[12] : 4761.7
ARIMA(2,0,1)(0,1,1)[12] : 4759.357
ARIMA(1,0,2)(0,1,1)[12] : 4757.868
ARIMA(0,0,0)(0,1,1)[12] : 4819.725
ARIMA(0,0,2)(0,1,1)[12] : 4775.225
ARIMA(2,0,0)(0,1,1)[12] : 4762.753
ARIMA(2,0,2)(0,1,1)[12] : 4763.001
Now re-fitting the best model(s) without approximations...
ARIMA(1,0,1)(0,1,1)[12] : Inf
ARIMA(1,0,1)(1,1,1)[12] : Inf
ARIMA(1,0,1)(0,1,1)[12] with drift : Inf
ARIMA(1,0,2)(0,1,1)[12] : Inf
ARIMA(1,0,1)(0,1,2)[12] : Inf
ARIMA(2,0,1)(0,1,1)[12] : Inf
ARIMA(1,0,1)(1,1,2)[12] : Inf
ARIMA(1,0,0)(0,1,1)[12] : Inf
ARIMA(1,0,2)(0,1,1)[12] with drift : Inf
ARIMA(2,0,0)(0,1,1)[12] : Inf
ARIMA(1,0,1)(1,1,1)[12] with drift : Inf
ARIMA(2,0,2)(0,1,1)[12] : Inf
ARIMA(1,0,1)(0,1,2)[12] with drift : Inf
ARIMA(2,0,1)(0,1,1)[12] with drift : Inf
ARIMA(1,0,1)(1,1,2)[12] with drift : Inf
ARIMA(1,0,0)(0,1,1)[12] with drift : Inf
ARIMA(2,0,0)(0,1,1)[12] with drift : Inf
ARIMA(2,0,2)(0,1,1)[12] with drift : Inf
ARIMA(0,0,2)(0,1,1)[12] : Inf
ARIMA(0,0,1)(0,1,1)[12] : Inf
ARIMA(0,0,2)(0,1,1)[12] with drift : Inf
ARIMA(0,0,1)(0,1,1)[12] with drift : Inf
ARIMA(0,0,1)(0,1,2)[12] with drift : Inf
ARIMA(0,0,1)(1,1,2)[12] with drift : Inf
ARIMA(0,0,1)(1,1,1)[12] with drift : Inf
ARIMA(1,0,1)(1,1,0)[12] : 4988.73
Best model: ARIMA(1,0,1)(1,1,0)[12]
fit_monthly
Series: monthly_arima
ARIMA(1,0,1)(1,1,0)[12]
Coefficients:
ar1 ma1 sar1
0.8466 -0.5434 -0.5353
s.e. 0.0564 0.0883 0.0497
sigma^2 estimated as 3802460: log likelihood=-2483.12
AIC=4974.25 AICc=4974.4 BIC=4988.73
confint(fit_monthly)
2.5 % 97.5 %
ar1 0.7361480 0.9571085
ma1 -0.7164356 -0.3703286
sar1 -0.6326515 -0.4379326
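The intervals from `confint()` are Wald intervals: the estimate ± 1.96 × standard error. The information criteria follow from the log likelihood with four estimated parameters (ar1, ma1, sar1 and sigma^2). The sketch below recomputes both from the rounded values printed above. It assumes the series has 288 months (the 24 years in the decomposition table), which leaves n = 276 observations after the seasonal difference. That figure is inferred, not stated in the output.

```r
# Values copied (rounded) from the printed fit_monthly output
est <- c(ar1 = 0.8466, ma1 = -0.5434, sar1 = -0.5353)
se  <- c(ar1 = 0.0564, ma1 = 0.0883, sar1 = 0.0497)

# Wald 95% interval: estimate +/- qnorm(0.975) * standard error
z <- qnorm(0.975)
ci <- cbind(`2.5 %` = est - z * se, `97.5 %` = est + z * se)
round(ci, 4)

# Information criteria from the log likelihood
loglik <- -2483.12
n_par <- 4    # ar1, ma1, sar1 and sigma^2
n_obs <- 276  # assumption: 288 months minus 12 lost to the seasonal difference
aic  <- -2 * loglik + 2 * n_par
aicc <- aic + 2 * n_par * (n_par + 1) / (n_obs - n_par - 1)
bic  <- -2 * loglik + n_par * log(n_obs)
round(c(AIC = aic, AICc = aicc, BIC = bic), 2)
```

Differences in the last decimal place come from rounding in the printed inputs. In the trace above, every exact re-fit returned Inf until ARIMA(1,0,1)(1,1,0)[12]. That model was therefore selected even though its approximate BIC (4804.576) was well above the best candidates. Re-running `auto.arima()` with `approximation = FALSE` may be worth trying.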
fires_small %>%
group_by(discovery_date) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = discovery_date, y = num_fires) +
geom_line(col = "dark blue")
This shows a typical time series plot, with a yearly seasonal pattern that is likely due to warmer weather in the summer.
fires_small %>%
group_by(discovery_doy) %>%
summarise(num_fires = n()) %>%
ggplot(aes(x = discovery_doy, y = num_fires)) +
geom_line(col = "dark blue")
There are peaks around days 60-110 and a big peak around day 180.
fires_small %>%
group_by(discovery_doy) %>%
summarise(num_fires = n()) %>%
arrange(desc(num_fires))
`summarise()` ungrouping output (override with `.groups` argument)
The two highest days of the year are days 185 and 186, which are Independence Day (4th July) in a normal year and a leap year respectively. So I imagine most of the extra fires on those days (more than double the normal amount) are caused by fireworks.
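The day-of-year claim can be checked with base R's `%j` format code, which returns the day of the year for a date. In a leap year, February 29th moves every later date one day further into the year. Going the other way, `as.Date(n, origin = d)` counts `origin` as day 0, so a 1-based day of year needs 1 subtracted before it is used as an offset. The helper `doy_to_date()` below is hypothetical, and the years are arbitrary examples from the 1992-2015 range.

```r
# Day of year for Independence Day in a common year and a leap year
as.integer(format(as.Date("2015-07-04"), "%j"))   # 185
as.integer(format(as.Date("2012-07-04"), "%j"))   # 186

# Hypothetical helper: the origin counts as day 0, so subtract 1 from the DOY
doy_to_date <- function(doy, year) {
  as.Date(doy - 1, origin = paste0(year, "-01-01"))
}
doy_to_date(185, 2015)   # "2015-07-04"
doy_to_date(186, 2012)   # "2012-07-04"

# Without the -1 the date lands one day late
as.Date(185, origin = "2015-01-01")   # "2015-07-05"
```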
fires_small %>%
group_by(discovery_moy) %>%
summarise(num_fires = n()) %>%
ggplot(aes(x = discovery_moy, y = num_fires)) +
geom_col(fill = "dark blue", col = "white")
`summarise()` ungrouping output (override with `.groups` argument)
There are two definite peaks during the year. The March-April peak is possibly due to the US “Spring Break”, when schools and universities close and families are likely to be on vacation, possibly visiting National Parks. July and August also fall in the school Summer Break, with both families visiting parks and hot weather likely causes of fire outbreaks.
# Turn off scientific notation so axis labels show full numbers
options(scipen = 999)
fires_small %>%
group_by(stat_cause_descr) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = reorder(stat_cause_descr, num_fires), y = num_fires) +
geom_col(fill = "dark blue") +
coord_flip()
`summarise()` ungrouping output (override with `.groups` argument)
fires_small %>%
group_by(stat_cause_descr) %>%
summarise(avg_size = mean(fire_size)) %>%
ggplot +
aes(x = reorder(stat_cause_descr, avg_size), y = avg_size) +
geom_col(fill = "dark blue") +
coord_flip()
`summarise()` ungrouping output (override with `.groups` argument)
fires_small %>%
summarise(num_na = sum(is.na(cont_date)))
About half the containment dates are missing, so burn time cannot be calculated for roughly half of the fires, making it very difficult to do any meaningful analysis of it.
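`mean(is.na(x))` gives the share of missing values directly, which is easier to interpret than a raw count. Where both dates are known, the burn time is simply the containment date minus the discovery date. The sketch below uses a made-up five-row data frame, `toy_dates`. The column names mirror `fires_small`, but the values are invented. Both columns are assumed to already be `Date` objects; in the raw table `cont_date` may need converting first.

```r
# Hypothetical toy data with the same column names as fires_small
toy_dates <- data.frame(
  discovery_date = as.Date(c("2005-06-01", "2005-06-03", "2005-07-04",
                             "2005-07-10", "2005-08-15")),
  cont_date      = as.Date(c("2005-06-02", NA, "2005-07-04", NA,
                             "2005-08-20"))
)

# Share of fires with no containment date
mean(is.na(toy_dates$cont_date))   # 0.4

# Burn time in days, only defined where cont_date is known
toy_dates$burn_days <- as.numeric(toy_dates$cont_date - toy_dates$discovery_date)
toy_dates$burn_days                 # 1 NA 0 NA 5
```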
fires_small %>%
group_by(fire_size_class) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = fire_size_class, y = num_fires, fill = fire_size_class) +
geom_col() +
scale_fill_manual(values = c("red", "orange", "yellow", "green", "blue",
"purple", "black"),
name = "Fire Size Classification",
breaks = c("A", "B", "C", "D", "E", "F", "G"),
labels = c("A: < 1/4 acre", "B: 1/4 to 10 acres", "C: 10 to 100 acres",
"D: 100 to 300 acres", "E: 300 to 1000 acres",
"F: 1000 to 5000 acres", "G: More than 5000 acres"))
`summarise()` ungrouping output (override with `.groups` argument)
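The size classes in the legend can be reproduced from `fire_size` with base R's `cut()`. This is a sketch under assumptions. The hypothetical helper `size_class()` treats each lower bound as inclusive, so exactly 10 acres is class C. It also counts 0.25 acres and below as class A, placing the A/B boundary at 0.26 acres. Check these boundaries against the FPA FOD documentation before relying on them.

```r
# Hypothetical helper: map fire size (acres) to the A-G classes in the legend.
# Assumes lower bounds are inclusive, e.g. [10, 100) is class C.
size_class <- function(acres) {
  cut(acres,
      breaks = c(0, 0.26, 10, 100, 300, 1000, 5000, Inf),
      labels = c("A", "B", "C", "D", "E", "F", "G"),
      right = FALSE)
}

size_class(c(0.1, 5, 10, 150, 500, 2000, 12000))
# classes: A B C D E F G
```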
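For the maps I'll use ggplot2's geom_polygon() and coord_map() functions (coord_map() needs the mapproj package), along with the ggthemes theme_map() function. The state boundary coordinates come from ggplot2's map_data(), which uses the maps package, while the datasets package supplies other information on the US states, such as their abbreviations and names.
# State boundary co-ordinates from the maps package, via ggplot2's map_data()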
state_map <- map_data("state")
state_map
state.abb
[1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" "HI" "ID" "IL" "IN" "IA"
[16] "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH" "NJ"
[31] "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT" "VT"
[46] "VA" "WA" "WV" "WI" "WY"
state.name
[1] "Alabama" "Alaska" "Arizona" "Arkansas"
[5] "California" "Colorado" "Connecticut" "Delaware"
[9] "Florida" "Georgia" "Hawaii" "Idaho"
[13] "Illinois" "Indiana" "Iowa" "Kansas"
[17] "Kentucky" "Louisiana" "Maine" "Maryland"
[21] "Massachusetts" "Michigan" "Minnesota" "Mississippi"
[25] "Missouri" "Montana" "Nebraska" "Nevada"
[29] "New Hampshire" "New Jersey" "New Mexico" "New York"
[33] "North Carolina" "North Dakota" "Ohio" "Oklahoma"
[37] "Oregon" "Pennsylvania" "Rhode Island" "South Carolina"
[41] "South Dakota" "Tennessee" "Texas" "Utah"
[45] "Vermont" "Virginia" "Washington" "West Virginia"
[49] "Wisconsin" "Wyoming"
state_list <- tibble(state = state.abb, state_name = state.name)
state_list
The state names in the state_map dataframe are in lower case, in a column named ‘region’. I shall change the state_list tibble to the same format so the two can be joined together.
state_list <- tibble(state = state.abb, region = tolower(state.name))
Joining state_list to the fires_small dataset.
fires_states <- fires_small %>%
left_join(state_list, by = "state")
fires_states
fires_states %>%
filter(is.na(region))
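The rows with no region are for DC and PR, which are not in the state_list tibble. DC is the District of Columbia, which is not a state and so was not in the original list of states. PR is Puerto Rico, which is also not a state but is the largest US territory.
# Adding 2 new 'states'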
state.abb <- append(state.abb, c("DC", "PR"))
state.name <- append(state.name, c("District of Columbia", "Puerto Rico"))
state_list <- tibble(state = state.abb, region = tolower(state.name))
# Re-joining tibbles
fires_states <- fires_small %>%
left_join(state_list, by = "state")
# Checking the join has worked properly and there are no NAs
fires_states %>%
filter(is.na(region))
Warning in `[<-.data.frame`(`*tmp*`, is_list, value = list(`23` = "<S3: blob>")) :
replacement element 1 has 1 row to replace 0 rows
# Code below brings up a "vector memory exhausted (limit reached?)" error
# fires_joined <- fires_states %>%
# right_join(state_map, by = "region")
fires_joined <- fires_states %>%
select(region) %>%
group_by(region) %>%
summarise(num_fires = n()) %>%
right_join(state_map, by = "region")
`summarise()` ungrouping output (override with `.groups` argument)
Success! Now for the first geospatial visualisation.
fires_joined %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = num_fires)) +
geom_polygon() +
geom_path(color = "white") +
scale_fill_continuous(low = "darkblue",
high = "darkred",
name = "Number of fires") +
theme_map() +
coord_map("mollweide") +
ggtitle("Total US Wildfires from 1992-2015") +
theme(plot.title = element_text(hjust = 0.5))
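Aggregating before the join, as in `fires_joined` above, is what avoids the memory error. `state_map` has many boundary points per state, and a join pairs every fire with every boundary point of its state. The raw join therefore has (fires in state) × (points in state) rows, summed over all states. Counting fires per state first leaves exactly one row per boundary point. Below is a toy illustration with base R's `merge()`, using made-up data frames `toy_fires` and `toy_map`.

```r
# Made-up stand-ins: 5 fires in two states, 7 boundary points in total
toy_fires <- data.frame(region = c("texas", "texas", "texas", "ohio", "ohio"))
toy_map <- data.frame(region = c(rep("texas", 4), rep("ohio", 3)),
                      order = 1:7)

# Joining raw rows pairs every fire with every boundary point of its state
joined_raw <- merge(toy_fires, toy_map, by = "region")
nrow(joined_raw)   # 3 * 4 + 2 * 3 = 18

# Counting first leaves one row per state, then one row per boundary point
counts <- as.data.frame(table(region = toy_fires$region),
                        responseName = "num_fires")
joined_agg <- merge(counts, toy_map, by = "region")
nrow(joined_agg)   # 7
```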
fires_states %>%
distinct(stat_cause_descr) %>%
arrange(stat_cause_descr)
fires_states %>%
select(stat_cause_descr) %>%
group_by(stat_cause_descr) %>%
summarise(num_fires = n()) %>%
arrange(desc(num_fires))
`summarise()` ungrouping output (override with `.groups` argument)
fires_states %>%
select(region) %>%
group_by(region) %>%
summarise(num_fires = n()) %>%
arrange(desc(num_fires))
`summarise()` ungrouping output (override with `.groups` argument)
# Function for plotting cause of fire
cause <- function(cause) {
fires_states %>%
filter(stat_cause_descr == cause) %>%
select(region) %>%
group_by(region) %>%
summarise(num_fires = n()) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = num_fires)) +
geom_polygon() +
geom_path(color = "white") +
scale_fill_continuous(low = "darkblue",
high = "darkred",
name = "Number of fires") +
theme_map() +
coord_map("mollweide") +
ggtitle(paste0("Total US Wildfires caused by ", cause, " from 1992-2015")) +
theme(plot.title = element_text(hjust = 0.5))
}
cause("Arson")
`summarise()` ungrouping output (override with `.groups` argument)
Arson does seem more prevalent in the south-eastern states of Mississippi, Georgia and Alabama, and also in the western state of California.
cause("Campfire")
`summarise()` ungrouping output (override with `.groups` argument)
Fires caused by campfires are most prevalent in the western states of Oregon, California and Arizona.
cause("Children")
`summarise()` ungrouping output (override with `.groups` argument)
Fires caused by children are spread across the country, but the most affected states are California in the west, and Alabama, South Carolina and New Jersey in the east.
cause("Debris Burning")
`summarise()` ungrouping output (override with `.groups` argument)
Fires caused by debris burning are mostly in the warmer southern states of Texas, Georgia and North Carolina.
cause("Equipment Use")
`summarise()` ungrouping output (override with `.groups` argument)
Most of the fires caused by equipment use seem to be in California.
cause("Fireworks")
`summarise()` ungrouping output (override with `.groups` argument)
Most of the fires caused by fireworks seem to be in the north of the country, primarily in South Dakota, Montana and Washington state.
cause("Lightning")
`summarise()` ungrouping output (override with `.groups` argument)
Apart from a hotspot in Florida, the vast majority of fires caused by lightning are in the west of the country, with the three most affected states being California, Oregon and Arizona.
cause("Miscellaneous")
`summarise()` ungrouping output (override with `.groups` argument)
There seem to be quite a few fires classified as miscellaneous in California, Texas and New York.
cause("Missing/Undefined")
`summarise()` ungrouping output (override with `.groups` argument)
The states with the most missing or undefined causes are North and South Carolina, Oklahoma and California.
cause("Powerline")
`summarise()` ungrouping output (override with `.groups` argument)
Texas has the largest number of wildfires caused by powerlines. This is likely due to the warm climate and the large proportion of the state that is dry grassland used for agriculture. (1)
cause("Railroad")
`summarise()` ungrouping output (override with `.groups` argument)
Florida has by far the most wildfires caused by railroads.
cause("Smoking")
`summarise()` ungrouping output (override with `.groups` argument)
Fires caused by smoking seem to be spread across the country, but are concentrated mainly on the east and west coasts.
cause("Structure")
`summarise()` ungrouping output (override with `.groups` argument)
South Dakota has the largest number of fires caused by structures.
The datasets package also has the area of each state in square miles, in the state.area vector.
state.area
[1] 51609 589757 113909 53104 158693 104247 5009 2057 58560 58876
[11] 6450 83557 56400 36291 56290 82264 40395 48523 33215 10577
[21] 8257 58216 84068 47716 69686 147138 77227 110540 9304 7836
[31] 121666 49576 52586 70665 41222 69919 96981 45333 1214 31055
[41] 77047 42244 267339 84916 9609 40815 68192 24181 56154 97914
length(state.area)
[1] 50
(Area figures for DC and PR obtained from Wikipedia)
DC = 68 sq miles, PR = 3515 sq miles
# To make life easier, remove the modified state.abb and state.name vectors (restoring the datasets originals) and rebuild the tibble, adding the area figures at the same time so they are in the correct order.
rm(state.abb)
rm(state.name)
state.abb <- append(state.abb, c("DC", "PR"))
state.name <- append(state.name, c("District of Columbia", "Puerto Rico"))
state.area <- append(state.area, c(68, 3515))
state_list <- tibble(state = state.abb, region = tolower(state.name), area = as.numeric(state.area))
# Re-joining tibbles
fires_states <- fires_small %>%
left_join(state_list, by = "state")
fires_states %>%
select(region, area) %>%
group_by(region, area) %>%
summarise(num_fires = n()) %>%
mutate(fires_sqmile = num_fires / area) %>%
arrange(desc(fires_sqmile))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
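There are two small points about the per-square-mile figures. `append()` on a numeric vector with character values converts the whole vector to character, so the areas only become numbers again through `as.numeric()`. Appending numeric values avoids that round trip. The density itself is just the fire count divided by the area. The sketch below uses Alabama's `state.area` value and the DC and PR areas quoted above, with made-up fire counts in `toy_counts`.

```r
# Appending character strings turns the whole numeric vector into character
toy_areas <- c(AL = 51609)
class(append(toy_areas, c(DC = "68", PR = "3515")))   # "character"

# Appending numbers keeps the vector numeric
toy_areas <- append(toy_areas, c(DC = 68, PR = 3515))
class(toy_areas)                                       # "numeric"

# Fires per square mile, using made-up fire counts
toy_counts <- c(AL = 1000, DC = 10, PR = 500)
round(toy_counts / toy_areas[names(toy_counts)], 3)
```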
fires_states %>%
select(region, area) %>%
group_by(region, area) %>%
summarise(num_fires = n()) %>%
mutate(fires_sqmile = num_fires / area) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = fires_sqmile)) +
geom_polygon() +
geom_path(color = "white") +
scale_fill_distiller(name = "Fires per Sq Mile", palette = "PuBuGn") +
theme_map() +
coord_map("mollweide") +
ggtitle("Total US Wildfires per Square Mile from 1992-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Puerto Rico is not shown on this map (nor are Alaska and Hawaii, as map_data("state") only covers the contiguous US), but the south-eastern states still have the highest density of wildfires. Interestingly, New Jersey also shows up as a hotspot in the north-east of the country.
fires_states %>%
select(stat_cause_descr, fire_year) %>%
group_by(fire_year, stat_cause_descr) %>%
filter(stat_cause_descr %in% c("Arson", "Campfire", "Children", "Equipment Use",
"Fireworks", "Smoking")) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = fire_year, y = num_fires, colour = stat_cause_descr) +
geom_line()
`summarise()` regrouping output by 'fire_year' (override with `.groups` argument)
There are two obvious peaks in Arson, in 1999 and 2006. There was a large heatwave in 2006, but I’m not sure why this would result in an increase in arson, unless the dry ground provided extra fuel, helping fires spread that would normally not have become large-scale fires. This may also explain the 2006 peak in Equipment Use. Arson does, however, appear to have been decreasing since 2006.
fires_states %>%
select(stat_cause_descr, fire_year) %>%
group_by(fire_year, stat_cause_descr) %>%
filter(stat_cause_descr %in% c("Debris Burning", "Lightning", "Miscellaneous",
"Missing/Undefined", "Powerline", "Railroad",
"Structure")) %>%
summarise(num_fires = n()) %>%
ggplot +
aes(x = fire_year, y = num_fires, colour = stat_cause_descr) +
geom_line()
`summarise()` regrouping output by 'fire_year' (override with `.groups` argument)
Similar peaks can be seen in Debris Burning, Miscellaneous and Lightning during the 2006 heatwave, which left the ground very dry. Debris Burning, Miscellaneous and Lightning also peak between 1997 and 2003, while Missing/Undefined has a trough over the same period, so this is likely to be due to more accurate classification of fires, with less use of the Missing/Undefined category.
state_map_southern <- state_map %>%
filter(region %in% c("florida", "georgia", "alabama", "mississippi",
"south carolina", "north carolina", "tennessee",
"arkansas", "louisiana"))
fires_states %>%
filter(fire_year %in% 1992:1995) %>%
filter(region %in% c("florida", "georgia", "alabama", "mississippi",
"south carolina", "north carolina", "tennessee",
"arkansas", "louisiana")) %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Main cause of wildfires in southern states 1992-1995") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 1996:1999) %>%
filter(region %in% c("florida", "georgia", "alabama", "mississippi",
"south carolina", "north carolina", "tennessee",
"arkansas", "louisiana")) %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Main cause of wildfires in southern states 1996-1999") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2000:2003) %>%
filter(region %in% c("florida", "georgia", "alabama", "mississippi",
"south carolina", "north carolina", "tennessee",
"arkansas", "louisiana")) %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Main cause of wildfires in southern states 2000-2003") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2004:2007) %>%
filter(region %in% c("florida", "georgia", "alabama", "mississippi",
"south carolina", "north carolina", "tennessee",
"arkansas", "louisiana")) %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Main cause of wildfires in southern states 2004-2007") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2008:2011) %>%
filter(region %in% c("florida", "georgia", "alabama", "mississippi",
"south carolina", "north carolina", "tennessee",
"arkansas", "louisiana")) %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Main cause of wildfires in southern states 2008-2011") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2012:2015) %>%
filter(region %in% c("florida", "georgia", "alabama", "mississippi",
"south carolina", "north carolina", "tennessee",
"arkansas", "louisiana")) %>%
select(region, stat_cause_descr) %>%
group_by(region, stat_cause_descr) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map_southern, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = stat_cause_descr)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Cause of Fires", palette = "PuBuGn") +
ggtitle("Main cause of wildfires in southern states 2012-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
Looking at these trends, some interesting insights emerge. For the combined years, Florida stands out as having railroad as its main cause of wildfire. The plots above show, however, that railroad fires are only the main cause up to the four-year period ending in 2003. From then until the end of the collection period in 2015, the main cause is lightning. Similarly, arson appears fairly often as the main cause in the southern states until 2007, after which it is no longer the most common cause of wildfire. This downward trend was also noted earlier in the overall causation plots for all states.
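The six blocks above differ only in their year range, so the counting step could be written once. One thing to watch is that `top_n(1)` keeps all tied rows. A state with two equally common causes therefore gets two rows after the join, and its polygons are drawn twice with different fills. Below is a base R sketch of the counting step, using a hypothetical helper `top_cause()` and made-up data in `toy_causes`. Like `top_n(1)`, it keeps ties, but it also flags them.

```r
# Hypothetical helper: most common cause per region within a year window.
# Keeps every tied cause (as top_n(1) does) and flags regions with ties.
top_cause <- function(df, years) {
  sub <- df[df$fire_year %in% years, ]
  counts <- aggregate(list(num_fire = sub$stat_cause_descr),
                      by = list(region = sub$region,
                                stat_cause_descr = sub$stat_cause_descr),
                      FUN = length)
  best <- ave(counts$num_fire, counts$region, FUN = max)
  top <- counts[counts$num_fire == best, ]
  top$tied <- ave(top$num_fire, top$region, FUN = length) > 1
  top[order(top$region), ]
}

# Made-up fires: Florida is tied between two causes in 1992-1995
toy_causes <- data.frame(
  region = c("florida", "florida", "georgia", "georgia", "georgia", "florida"),
  stat_cause_descr = c("Railroad", "Lightning", "Arson", "Arson",
                       "Debris Burning", "Lightning"),
  fire_year = c(1992, 1993, 1994, 1995, 1993, 1999)
)
res <- top_cause(toy_causes, 1992:1995)
res
```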
fires_states %>%
select(region, fire_size_class) %>%
group_by(region, fire_size_class) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = fire_size_class)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Fire Size Class", palette = "PuBuGn") +
ggtitle("Most common wildfire size per State 1992-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
select(region, fire_size_class) %>%
filter(fire_size_class == "G") %>%
group_by(region) %>%
summarise(num_fire = n()) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = num_fire)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_distiller(name = "Number of Fires", palette = "PuBuGn") +
ggtitle("Number of large class G fires per State 1992-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` ungrouping output (override with `.groups` argument)
From these plots, the western states appear to have both the most small fires and the most large fires! Not the most helpful plots…
fires_states %>%
filter(fire_year %in% 1992:1995) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 1992-1995") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 1996:1999) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 1996-1999") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2000:2003) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2000-2003") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2004:2007) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2004-2007") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2008:2011) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2008-2011") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2012:2015) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2012-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
fires_states %>%
filter(fire_year %in% 2010:2015) %>%
select(region, discovery_moy) %>%
group_by(region, discovery_moy) %>%
summarise(num_fire = n()) %>%
top_n(1) %>%
right_join(state_map, by = "region") %>%
ggplot +
(aes(x = long, y = lat, group = group, fill = discovery_moy)) +
geom_polygon() +
geom_path(color = "white") +
theme_map() +
scale_fill_brewer(name = "Months of Year", palette = "PuBuGn") +
ggtitle("Month with most fires per State 2010-2015") +
theme(plot.title = element_text(hjust = 0.5))
`summarise()` regrouping output by 'region' (override with `.groups` argument)
Selecting by num_fire
The above plots are quite interesting. The month with the most fires varies widely between states. Broadly, the eastern half of the country has the most fires in the Spring (Feb-May), and the western half has the most later on, in the Summer and Fall (Jun-Oct). There are, however, a few exceptions: in the 2004-2007 and 2008-2011 data, Texas has the most fires in January.
Florida also mostly conforms to the east/west split, with most of its worst months for fires falling in March or April up until 2007. After that, the peak month moves later, to June and July, for the rest of the reporting period to 2015. This may be linked to the main cause of fires in Florida changing from railroad to lightning at about the same time, as noted earlier when looking at causation. As July is the main month for tropical storms and lightning in Florida, this is a possible reason for the peak month moving later in the year. (2)
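The near-identical four-year blocks for both causes and months could also be collapsed by labelling each fire with its period and faceting a single plot. The labelling step is a one-liner with base R's `cut()`. This sketch assumes four-year periods starting in 1992 and data ending in 2015, as in the plots above. `period_of()` is a hypothetical helper.

```r
# Hypothetical helper: label each year with its four-year period
starts <- seq(1992, 2012, by = 4)
period_of <- function(year) {
  cut(year,
      breaks = c(starts, 2016),                      # [1992, 1996), ..., [2012, 2016)
      labels = paste(starts, starts + 3, sep = "-"),  # "1992-1995", ..., "2012-2015"
      right = FALSE)
}

period_of(c(1992, 1995, 1996, 2007, 2015))
```

You could add a column with something like `mutate(period = period_of(fire_year))`, then group by `period`, `region` and `discovery_moy` and add `facet_wrap(~ period)`. That should replace the separate blocks, although this has not been run against the full dataset.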